This should be the last technical article on this topic. I actually had today's content planned out from early on: it is a program adapted from a foreign YouTuber's version, and what I added is mainly the part that lets you feed in new data. So let's begin. (I originally wanted to connect it to Discord, but that is a bit complicated and too much to cover in one day, so I left it out.)
A chatbot can hold a conversation with a user. It uses natural language processing (NLP) techniques and algorithms to understand messages and produce human-like responses.
(I am not using a TensorFlow model or any other algorithm here, just some basic arithmetic.)
data2.json
{
  "all_data": [
    {
      "tag": "Hello",
      "responses": [
        "Hello!"
      ],
      "patterns": [
        "hello",
        "hi",
        "hey",
        "sup",
        "heyo"
      ],
      "keyword": []
    },
    {
      "tag": "bye",
      "responses": [
        "Hope to see you again"
      ],
      "patterns": [
        "bye",
        "goodbye"
      ],
      "keyword": []
    },
    {
      "tag": "greeting",
      "responses": [
        "I am doing fine, and you?"
      ],
"patterns": [
"bye",
"goodbye"
],
"keyword": []
},
{
"tag": "thank",
"responses": [
"You are welcome"
],
"patterns": [
"thanks",
"thank",
"helpful"
],
"keyword": []
},
{
"tag": "eat",
"responses": [
"I don't like eating anything because I'm a bot obviously!"
],
"patterns": [
"what",
"you",
"eat"
],
"keyword": [
"you",
"eat"
]
},
{
"tag": "advice",
"responses": [
"If I were you, I would go to the internet and type exactly what you wrote there!"
],
"patterns": [
"give",
"advice"
],
"keyword": [
"advice"
]
},
{
"tag": [
"test"
],
"responses": [
"200"
],
"patterns": [
"test"
],
"keyword": []
},
{
"tag": [
"Deep_Learning"
],
"responses": [
"relate"
],
"patterns": [
"deep",
"learn",
"relat",
"ai"
],
"keyword": []
},
{
"tag": [
"My_name"
],
"responses": [
"My name is z2"
],
"patterns": [
"name"
],
"keyword": [
"Your"
      ]
    }
  ]
}
This database looks long, but it really only has a few entries; as you keep training it later, the data will keep growing. Splitting the database into these fields is just for readability: responses holds the answers the bot replies with, patterns lists the words to search for in the user's message, and keyword lists the words that must appear in the message before that response can be chosen. tag is only there so you can see roughly what each entry is about; today's project does not use it. If you are not familiar with JSON, you can check out my earlier article.
(The code used this time is a bit long.)
import re
import json
import random
import string
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# On a fresh install you may first need:
# nltk.download('punkt')
# nltk.download('stopwords')
def stop_words_and_tokenize(text: str):
    # Split the text into words, drop stop words and punctuation, then stem
    words = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    punctuation = set(string.punctuation)
    filtered_words = [word.lower() for word in words
                      if word.lower() not in stop_words and word.lower() not in punctuation]
    ps = PorterStemmer()
    stem_word = map(ps.stem, filtered_words)
    return list(stem_word)
def load_data(file_path: str):
    # Load the database file and return the list of entries
    with open(file_path, "r") as f:
        data: dict = json.load(f)
    return data["all_data"]
def unknown(user_text: str):
    # Pick a polite "I don't understand" reply at random and show it
    response = ["Could you please re-phrase that? ",
                "Sounds about right.",
                "What does that mean?"][random.randrange(3)]
    print('Bot: ' + response)
    # Ask the user to teach the bot: a tag for the new knowledge and a reply
    newknowledge = input("What does it mean? Can you teach me: ")
    respon = input("How should I respond: ")
    # Use the simplified words of the original message as the new patterns
    new_words = stop_words_and_tokenize(user_text)
    item = {"tag": [newknowledge], "responses": [respon], "patterns": new_words, "keyword": []}
    with open('data2.json') as f:
        config = json.load(f)
    config["all_data"].append(item)
    with open('data2.json', 'w') as f:
        json.dump(config, f, indent=2)
    return "thank you"
def message_probability(user_message, recognised_words, single_response=False, required_words=[]):
    message_certainty = 0
    has_required_words = True

    # A learned entry can end up with no patterns if every word was a stop word
    if not recognised_words:
        return 0

    # Count how many of the entry's pattern words appear in the user's message
    for word in user_message:
        if word in recognised_words:
            message_certainty += 1

    # Calculate the percentage of recognised words found in the user's message
    percentage = float(message_certainty) / float(len(recognised_words))

    # Check whether every required keyword appears in the message
    for word in required_words:
        if word not in user_message:
            has_required_words = False
            break

    if has_required_words or single_response:
        return int(percentage * 100)
    else:
        return 0
def check_all_messages(message):
    highest_prob_list = {}

    # Score one candidate response after filtering and record it in the dictionary
    def response(bot_response, list_of_words, single_response=False, required_words=[]):
        nonlocal highest_prob_list
        highest_prob_list[bot_response] = message_probability(message, list_of_words, single_response, required_words)

    n = load_data("data2.json")
    for i in n:
        response(i["responses"][0], i["patterns"], required_words=i["keyword"])

    best_match = max(highest_prob_list, key=highest_prob_list.get)
    print(highest_prob_list)
    print(f'Best match = {best_match} | Score: {highest_prob_list[best_match]}')

    # If nothing matched at all, switch to learning mode
    return unknown(' '.join(message)) if highest_prob_list[best_match] < 1 else best_match
# Used to get the bot's response to one line of user input
def get_response(user_input):
    split_message = re.split(r'\s+|[,;?!.-]\s*', user_input.lower())
    response = check_all_messages(split_message)
    print(split_message)
    return response
# Run the chat loop; every turn either answers or learns something new
while True:
    prompt_text = input("You: ")
    print('Bot: ' + get_response(prompt_text))
You can try typing things the database already contains, or type something new to see whether it records it. Since it is just plain conversation, I will not include screenshots here.
Import the required libraries.
stop_words_and_tokenize
Simplifies the input message.
stopwords.words('english')
Filters out the words that do not carry much meaning in a sentence (stop words).
ps.stem
Converts every word into a base form so it is easier to classify later.
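To see what these three steps do together, here is a quick sketch (sample output from a typical NLTK install; note the second sentence reduces to exactly the patterns stored under the Deep_Learning tag above):
print(stop_words_and_tokenize("What do you like to eat?"))
# -> ['like', 'eat']   (stop words and the '?' are dropped, the rest stemmed)
print(stop_words_and_tokenize("Is deep learning related to AI?"))
# -> ['deep', 'learn', 'relat', 'ai']   ('learning' -> 'learn', 'related' -> 'relat')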
load_data
Defines a function that loads all the data; nothing complicated here, you can work through it yourself.
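A quick check against the data2.json shown above:
entries = load_data("data2.json")
print(entries[0]["tag"])       # -> Hello
print(entries[0]["patterns"])  # -> ['hello', 'hi', 'hey', 'sup', 'heyo']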
unknown
When the bot gets an input it does not understand, it gives a polite response, then asks the user to explain what the message means and how to answer it, and saves that into the database. The next time the program starts it will know that input, so the database keeps improving.
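For example, suppose the user types "what is python" and nothing in the database matches. A teaching session might look like this (the tag and the reply are hypothetical user inputs; the debug prints are omitted):
You: what is python
Bot: What does that mean?
What does it mean? Can you teach me: python
How should I respond: Python is a programming language
Bot: thank you
Afterwards data2.json contains a new entry, with "what is python" reduced to its stemmed keywords:
{
  "tag": ["python"],
  "responses": ["Python is a programming language"],
  "patterns": ["python"],
  "keyword": []
}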
message_probability
A very simple algorithm: it just checks whether each pattern word appears in the sentence, adds to the score for every hit, and the highest-scoring entry wins.
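A quick worked example: for the input "hello there" and the Hello entry's patterns, only one of the five pattern words matches, so the score is 1 / 5 = 20:
score = message_probability(['hello', 'there'], ['hello', 'hi', 'hey', 'sup', 'heyo'])
print(score)  # -> int(1 / 5 * 100) = 20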
check_all_messages
Outputs the entry with the best score.
highest_prob_list
Holds a dictionary of how closely the message matches each possible response; the one with the highest probability is chosen as the reply.
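For instance, typing "hello" against the database above should print something like the following (abridged; every other entry scores 0, either because no pattern words match or because a required keyword is missing):
{'Hello!': 20, 'Hope to see you again': 0, 'I am doing fine, and you?': 0, 'You are welcome': 0, ...}
Best match = Hello! | Score: 20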
get_response
Strips punctuation from the input and converts everything to lowercase letters.
re.split
Removes the punctuation first, then splits the message into individual words for the later filtering.
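For example (note the trailing empty string that re.split leaves behind after the final '?'; it is harmless here because '' never appears as a pattern):
print(re.split(r'\s+|[,;?!.-]\s*', "Hello, how are you?".lower()))
# -> ['hello', 'how', 'are', 'you', '']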
while True
Calls the functions so the whole chat feature runs.
Honestly, today's content took me quite a while to put together, but I find it very interesting and easy to get into. If you think my article helped you, or you have better suggestions, feel free to follow me and share them in the comments. See you tomorrow.
Reference:
https://www.youtube.com/watch?v=CkkjXTER2KE